# Data and Code Repository for "Closing the publishing gender gap in economics and political science: Does a critical mass matter?"

## Overview
This repository contains the data, code, and documentation for reproducing the Tables and Figures in the paper.

## Contents
- `/data`: Dataset.
- `/code`: Script for data cleaning, analysis, and visualization.
- `/docs`: Documentation, including metadata and a detailed description of the methodology.

## Data creation and description
This dataset was created to analyze gender gaps in academic publications and citations among economists and political scientists 
at the top 50 universities worldwide, based on the QS University Ranking 2023. Data collection occurred between January and March 2024 
and followed a multi-step process:
1. Faculty Identification: Faculty names and positions were web-scraped from department websites, supplemented with manual data 
collection where needed. 
2. Gender Classification: Faculty gender was coded using the Gender API, which classifies names based on statistical analysis. An
extensive manual check followed to verify accuracy and correct errors.
3. Position Classification: Positions were classified by keyword search on the position information from department websites. 
The classification was used to filter out (code as missing) anyone who is not full-time or permanent faculty, e.g. postdoctoral fellows, 
visiting professors, and similar roles. A dummy variable for distinguished professors was also created using NLP.
4. Google Scholar Data Retrieval: Bibliometric data, including the i10-index, total citations, and citations for the most cited 
publication, were collected from Google Scholar author pages. IDs were retrieved through automated and manual searches.

For more information on variable names, descriptions, types, and value labels, see *codebook.xlsx* in `/docs`.
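
As a rough illustration of steps 3 and 4, the sketch below shows what a keyword-based position filter and an i10-index calculation could look like in Python. It is not the code used to build the dataset: the keyword list and the function names `is_permanent_faculty` and `i10_index` are hypothetical, and the keywords actually used are not listed in this README. The i10-index follows its standard definition, the number of publications with at least 10 citations.

```python
# Hypothetical exclusion keywords; the keywords actually used are not listed here.
NON_PERMANENT_KEYWORDS = ("postdoc", "post-doc", "visiting")


def is_permanent_faculty(position):
    """Return False if the position title contains any exclusion keyword."""
    title = position.lower()
    return not any(keyword in title for keyword in NON_PERMANENT_KEYWORDS)


def i10_index(citation_counts):
    """Count the publications that have at least 10 citations."""
    return sum(1 for count in citation_counts if count >= 10)


# Made-up example inputs, for illustration only.
positions = [
    "Professor of Economics",
    "Visiting Professor",
    "Postdoctoral Fellow",
    "Associate Professor",
]
kept = [p for p in positions if is_permanent_faculty(p)]
print(kept)
print(i10_index([120, 45, 10, 9, 3]))
```

Substring matching is deliberately simple here. A real classification would likely need a longer keyword list and a manual check of ambiguous titles.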

## How to Use
1. Navigate to the `/code` folder, and open *analysis.do*.
2. In *analysis.do*, insert the path to the analysis folder.
3. Create a `/temp` and an `/output` folder; they store the temporary datasets and the paper's figures, respectively.
4. Run *analysis.do*. It saves the figures as JPEG files in the `/output` folder and displays the regression results for the tables on screen.
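
Step 3 can also be scripted instead of done by hand. The snippet below is a minimal sketch in Python, assuming the two folders belong directly under the analysis folder; the helper name `make_working_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path


def make_working_dirs(root):
    """Create the temp and output folders under root if they do not exist yet."""
    created = []
    for name in ("temp", "output"):
        folder = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing folder.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `make_working_dirs("path/to/analysis")`, with the placeholder replaced by the same path inserted in *analysis.do*, creates both folders in one go.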